Sys.setlocale(category = "LC_ALL", locale = "Greek")
library(here) # [CRAN] A replacement for 'file.path()', locating the files relative to the project root
library(tidyverse) # [CRAN] Easily install and load the 'Tidyverse'
library(cowplot) # [CRAN] Streamlined plot theme and plot annotations for 'ggplot2'
library(RColorBrewer) # [CRAN] ColorBrewer Palettes
library(PerformanceAnalytics) # [CRAN] Collection of econometric functions for performance and risk analysis
library(MicrobeR) # [github::jbisanz/MicrobeR] Handy functions for downstream visualization fo microbiome data
library(phyloseq) # [Bioconductor] Handling and analysis of high-throughput microbiome census data
library(factoextra) # [CRAN] Extract and visualize the results of multivariate data analyses
library(lmerTest) # [CRAN] Fits and tests in linear mixed effects models
library(lsr) # [CRAN] # Companion to "Learning Statistics with R"
library(EMAtools) # [CRAN] # Data Management Tools for Real-Time Monitoring/Ecological Momentary Assessment Data
library(DT) # [CRAN] An R interface to the DataTables library
library(Maaslin2)# [bitbucket::biobakery/maaslin2] Multivariate association analysis for microbiome data
source(here("code", "functions", "maaslin2_heatmap.R"))
load(here("data", "qiime2R", "phyloseq.RData"))
# Extract feature table, taxonomy and metadata from the phyloseq object
count_tab <- as.data.frame(otu_table(phyloseq))
tax_tab <- tax_table(phyloseq) %>% as("matrix") %>% as.data.frame()
metadata <- data.frame(sample_data(phyloseq), check.names = FALSE)
metadata <- metadata %>%
rownames_to_column() %>%
# hyphens in column names cause problems. We use underscore instead
rename_all(~gsub("-", "_", .x)) %>%
# convert FishID and NetPen to character so that they can be correctly recognized as random effects
mutate(FishID = as.character(FishID), NetPen = as.character(NetPen))
First of all, we get the best taxonomic annotation for each feature. Starting at the phylum level, a higher level taxonomic rank will be assigned to a lower level taxonomic rank if the lower taxonomic rank is null or contains strings like “uncultured”, “Ambiguous”, or “metagenome”.
tab_species <- Summarize.Taxa(count_tab, tax_tab)$Species %>%
rownames_to_column("tax") %>%
separate(tax, sep = ";", c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")) %>%
# the following 5 lines format the taxonomy of each feature to the best hit
mutate(Class = ifelse(Class == "NA"|grepl("uncultured|Ambiguous|metagenome", Class), Phylum, Class),
Order = ifelse(Order == "NA"|grepl("uncultured|Ambiguous|metagenome", Order), Class, Order),
Family = ifelse(Family == "NA"|grepl("uncultured|Ambiguous|metagenome", Family), Order, Family),
Genus = ifelse(Genus == "NA"|grepl("uncultured|Ambiguous|metagenome", Genus), Family, Genus),
Species = ifelse(Species == "NA"|grepl("uncultured|Ambiguous|metagenome", Species), Genus, Species)) %>%
select(-(Kingdom:Family))
Then we filter genera consisting of only one species in the present data set.
index <- tab_species %>%
filter(grepl("g__", Genus)) %>%
group_by(Genus) %>%
nest() %>%
mutate(nrow = map_int(data, ~nrow(.x))) %>%
filter(nrow == 1) %>%
unnest() %>%
filter(grepl("s__", Species))
For genera consisting of only one species, we replace genus names with species names.
tab_genus <- tab_species %>%
mutate(Genus = ifelse(grepl(paste(index$Genus, collapse = "|"), Genus), Species, Genus)) %>%
select(-Species) %>%
# remove the prefix from genus/species names
mutate_if(is.character, ~gsub("g__|s__", "", .x)) %>%
# the following 2 lines merge rows with the same taxonomy
group_by(Genus) %>%
summarise_all(sum) %>%
column_to_rownames("Genus")
For the multivariate association analysis, we’ll treat FishID and NetPen as random effects. Diet and Sample origin are the two main fixed effects we’re interested in. We’re also interested in potential associations between the microbial clades and other sample metadata collected from the same fish. Due to the small number of observations, we’ll limit the association tests to some of the sample metadata, including the distal intestine weight (DISI), histological scores of the distal intestine and expression levels of genes indicative of immune responses and barrier functions.
Histological examination of distal intestine showed various degrees of inflammation in different fish with no obivious diet effects. The morphological changes resemble those commonly observed in salmonid fed soybean meal as illustated below: shortening and fusion of mucosal folds, cellular infiltration within the lamina propria and submucosa, reduced supravacuolization within enterocytes and nucleus position disparity. Specifically, the histological sections were evaluated paying attentions to the changes in the mucosal fold height (mfh), supravacuolization within enterocytes (snv), submucosal cellularity (smc) and lamina propria cellularity (lpc).
As shown below, histological scores under different evaluation categories for the same fish agree with each other in most cases. The mucosal fold height (mfh) and supravacuolization within enterocytes (snv) showed the least and most variable changes, whereas cellular infiltration within the submucosa (smc) and lamina propria (lpc) showed intermediate level of changes. To avoid the multicollinearity, we’ll select one of them for the association analysis.
# Get histological scores from the metadata
df_histo <- filter(metadata, SampleOrigin == "Mucosa") %>%
select(FishID, contains("Histo")) %>%
rename_all(~gsub("Histology_", "", .x)) %>%
gather(key = "item", value = "score", -FishID) %>%
mutate(item = factor(item, c("mfh", "smc", "lpc", "snv")), score = factor(score, unique(.$score)))
# Plot histological scores for each fish
ggplot(df_histo, aes(x = item, y = score)) +
geom_point(colour = "blue") +
geom_line(aes(group = 1), colour = "blue") +
facet_wrap(~FishID, ncol = 6) +
labs(x = "Histological characteristics evaluated", y = "Histological scores") +
cowplot::theme_cowplot() +
theme(panel.grid.major.y = element_line(size = 0.5, linetype = 'dashed', colour = "black"))
The distibution of histological scores under each evaluation category can be quite unbalanced, which is chanlleging for running a proper association analysis. Let’s look at the histogram of histological scores under each evaluation category.
ggplot(df_histo, aes(score)) +
geom_histogram(stat = "count") +
facet_wrap(~item) +
scale_y_continuous(limits = c(0, 30), breaks = 0:6*5, expand = expansion(mult = c(0, 0.05))) +
theme_cowplot() +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
panel.grid.major.y = element_line(size = 0.5, linetype = 'dashed', colour = "black"))
The distibution of histological scores was indeed quite unbalanced. We’ll aggregate histological scores to binary data (normal VS. abnormal) so that data are more balanced. As microbiome data are sparse, aggregation of histologcial scores also increases the number of non-zero observations within each group, making association tests more reliable.
metadata <- metadata %>%
mutate_at(vars(contains("Histo")), ~gsub("Mild|Moderate|Marked|Severe", "Abnormal", .x)) %>%
mutate_at(vars(contains("Histo")), ~factor(.x, levels = c("Normal", "Abnormal")))
Let’s look at the distibution of histological scores after data aggregation.
filter(metadata, SampleOrigin == "Mucosa") %>%
select(FishID, contains("Histo")) %>%
rename_all(~gsub("Histology_", "", .x)) %>%
gather(key = "item", value = "score", -FishID) %>%
mutate(item = factor(item, c("mfh", "smc", "lpc", "snv")), score = factor(score, unique(.$score))) %>%
ggplot(aes(score)) +
geom_histogram(stat="count") +
facet_wrap(~item) +
scale_y_continuous(limits = c(0, 30), breaks = 0:6*5, expand = expansion(mult = c(0, 0.05))) +
theme_bw(base_size = 14) +
theme(panel.grid.major.y = element_line(size = 0.5, linetype = 'dashed', colour = "black"))
After data aggregation, lpc and snv showed a much balanced distribution. As cellular infiltration within the lamina propria (lpc) is a direct indication of gut inflammation, we’ll use it for the association analysis.
In addition to histolgocial examination, some of the marker genes of the gut inflammation in Atlantic salmon were also profiled by qPCR assays. Let’s plot the expression levels of these genes in fish assigned as normal or abnormal regarding the cellular infiltration within the lamina propria (lpc).
# Get the gene expression data on immune responses and convert to the long format
metadata_imm <- filter(metadata, SampleOrigin == "Mucosa") %>%
rename_all(~gsub("qPCR_", "", .x)) %>%
rename_all(~gsub("Histology_", "", .x)) %>%
select(NetPen, lpc, myd88:il4) %>%
gather(gene, quantity, -c(lpc, NetPen)) %>%
mutate(gene = factor(gene, unique(.$gene)), lpc = factor(lpc, unique(.$lpc)))
# Make boxplots showing difference in the gene expression level under different histological scores
ggplot(metadata_imm, aes(x = lpc, y = quantity)) +
geom_boxplot(outlier.shape = NA) +
geom_point(aes(fill = lpc), size = 2, shape = 21, position = position_jitterdodge(0.2)) +
facet_wrap(~gene, ncol = 4, scales = "free_y") +
scale_y_continuous(limits = c(0, NA), expand = expansion(mult = c(0, 0.1))) +
labs(x = "Histological score of lamina propria cellularity", y = "Quantity") +
theme_bw(base_size = 14) +
scale_fill_brewer(palette = "Dark2") +
theme(legend.position = "none")
Next, let’s find out differential expressed genes.
de_imm <- metadata_imm %>%
group_by(gene) %>%
nest() %>%
# fit linear mixed effects models
mutate(lmm = map(data, ~lmer(log(quantity) ~ lpc + (1|NetPen), data = .x)),
# mark models with singular fits
sf = map_chr(lmm, ~ifelse(is.null(.x@optinfo$conv$lme4$messages), "no", "yes")),
# significance testing for the fixed effect using Kenward-Roger approximation
anv = map(lmm, ~anova(.x, ddf = "Kenward-Roger")),
p_lmm = map_dbl(anv, ~.x["lpc", "Pr(>F)"]),
# get effect size: cohen's D
es_lmm = map2_dbl(lmm, data, ~lme.dscore(mod = .x, data = .y, type = "lme4")[1, "d"]),
# run Welch t-test
lm = map(data, ~t.test(log(quantity) ~ lpc, data = .x)),
p_lm = map_dbl(lm, ~.x$p.value),
# get effect size: cohen's D
es_lm = map_dbl(data, ~cohensD(log(quantity) ~ lpc, data = .x, , method = "unequal"))) %>%
select(gene, sf, p_lmm, p_lm, es_lmm, es_lm) %>%
ungroup() %>%
mutate(p = ifelse(sf == "no", p_lmm, p_lm),
p_adj = p.adjust(p, method = "fdr"),
"Cohen's D" = ifelse(sf == "no", es_lmm, es_lm)) %>%
filter(p_adj < 0.05) %>%
mutate_if(is.numeric, ~ifelse(.x < 0.001, "< 0.001", round(.x, 3)))
datatable(de_imm, options = list(columnDefs = list(list(className = 'dt-left', targets = c(1:9)))))
The expression levels of these maker genes are often correlated. Let’s check the correlation between the differential expressed genes.
# Subset metadata to contain differential expressed genes only
metadata_immSig <- filter(metadata, SampleOrigin == "Mucosa") %>%
column_to_rownames("FishID") %>%
rename_all(~gsub("qPCR_", "", .x)) %>%
select(as.character(de_imm$gene))
# Make correlation plot
chart.Correlation(metadata_immSig, histogram = TRUE, pch = 19)
To avoid multicollinearity and to reduce the number of association testing, we’ll run principle component analysis (PCA) and use the first principle component (PC) for the association analysis, which explains most of the variance in the data. As the expression levels of some of the genes are quite skewed, we’ll log transform the data before running PCA.
# Log transform the data
metadata_immSigLog <- rownames_to_column(metadata_immSig, "FishID") %>%
mutate_if(is.numeric, ~log(.x)) %>%
column_to_rownames("FishID")
# Check the data distribution after data transformation
chart.Correlation(metadata_immSigLog, histogram = TRUE, pch = 19)
Now we run the PCA. We can see that the PC1 explains most of the variance.
# Principle component analysis
pca_imm <- prcomp(metadata_immSigLog, scale = TRUE)
# Scree plot
fviz_eig(pca_imm)
Plot contributions of variables on the PC1.
fviz_contrib(pca_imm, choice="var", axes = 1)
PCA biplot.
# Subset metadata for the biplot annotation
pca_ann <- filter(metadata, SampleOrigin == "Mucosa")
# calculate variance
eigs_imm <- pca_imm$sdev^2
# make biplot
fviz_pca_biplot(
pca_imm,
select.var = list(name = NULL, cos2 = NULL, contrib = 9),
col.ind = as.factor(pca_ann$Histology_lpc),
palette = "Dark2",
pointsize = 2,
invisible = "quali", # hide centroid
repel = TRUE,
label = "var",
title = "",
legend.title = "Lamina propria cellularity") +
labs(x = paste0("PC1 (", round(100 * eigs_imm[1] / sum(eigs_imm), 1), "%)"),
y = paste0("PC2 (", round(100 * eigs_imm[2] / sum(eigs_imm), 1), "%)"))
Extract PC1.
imm_pc1 <- pca_imm$x %>%
as.data.frame() %>%
select(PC1) %>%
rename(qPCR_immune_response = PC1) %>%
rownames_to_column("FishID") %>%
uncount(2)
As shown below, the expression of barrier function relevant genes showed strong correlations.
# Subset metadata
metadata_barr <- filter(metadata, SampleOrigin == "Mucosa") %>%
column_to_rownames("FishID") %>%
rename_all(~gsub("qPCR_", "", .x)) %>%
select(muc2:cdh1)
# Correlation plot
chart.Correlation(metadata_barr, histogram = TRUE, pch = 19)
Again, we’ll run PCA and use the PC1 for the association analysis.
pca_barr <- prcomp(metadata_barr, scale = TRUE)
# Scree plot
fviz_eig(pca_barr)
Plot contributions of variables on the PC1
fviz_contrib(pca_barr, choice="var", axes = 1)
PCA biplot
# calculate variance
eigs_barr <- pca_barr$sdev^2
# make pca biplot
fviz_pca_biplot(
pca_barr,
select.var = list(name = NULL, cos2 = NULL, contrib = 5),
col.ind = as.factor(pca_ann$Histology_lpc),
palette = "Dark2",
pointsize = 2,
invisible = "quali",
repel = TRUE,
label = "var",
title = "",
legend.title = "Lamina propria cellularity") +
labs(x = paste0("PC1 (", round(100 * eigs_barr[1] / sum(eigs_barr), 1), "%)"),
y = paste0("PC2 (", round(100 * eigs_barr[2] / sum(eigs_barr), 1), "%)"))
Extract PC1.
barr_pc1 <- pca_barr$x %>%
as.data.frame() %>%
select(PC1) %>%
rename(qPCR_barrier_function = PC1) %>%
rownames_to_column("FishID") %>%
uncount(2)
Add the extracted principle components to the metadata.
metadata <- bind_cols(metadata, imm_pc1, barr_pc1) %>%
mutate(Diet = factor(Diet, c("REF", "IM"))) %>%
rename(Sample_origin = SampleOrigin) %>%
column_to_rownames("rowname")
# Check if the rows were correctly matched during the table merging
identical(metadata$FishID, metadata$FishID1)
## [1] TRUE
identical(metadata$FishID, metadata$FishID2)
## [1] TRUE
Define the fixed and random effects of interest.
# Fixed effects
fixef <- c("Diet", "Sample_origin", "DISI", "Histology_lpc", "qPCR_immune_response", "qPCR_barrier_function")
# Random effects
ranef <- c("FishID", "NetPen")
Run multivariate association analysis.
fit <- Maaslin2(input_data = tab_genus,
input_metadata = metadata,
output = here("data", "maaslin2"),
min_abundance = 0.0001,
min_prevalence = 0.25,
max_significance = 0.25,
normalization = "TSS",
transform = "LOG",
analysis_method = "LM",
random_effects = ranef,
fixed_effects = fixef,
correction = "BH",
standardize = TRUE,
plot_heatmap = TRUE,
plot_scatter = TRUE)
Merge MaAsLin2 outputs, feature table and metadata for plotting.
# Get features with significant associations for each metadata tested
sig_result <- filter(fit$results, qval <= 0.25) %>%
mutate(N.not.zero = paste0(N.not.zero, "/", N)) %>%
arrange(feature)
# Replicate metadata rows
metadata_rep <- rownames_to_column(metadata, "SampleID") %>%
uncount(nrow(sig_result)) %>%
arrange(SampleID)
# Merge the collapsed feature table, metadata and features with significant associations
tab_sigAss <- tab_genus %>%
rownames_to_column("feature") %>%
mutate_if(is.numeric, ~.x/sum(.x)) %>%
filter(feature %in% unique(sig_result$feature)) %>%
left_join(count(sig_result, feature), by = "feature") %>%
uncount(n) %>%
arrange(feature) %>%
bind_cols(sig_result, .) %>%
gather(SampleID, abundance, "AqFl2-01":"AqFl2-72") %>%
arrange(SampleID) %>%
bind_cols(metadata_rep, .) %>%
mutate(qval = ifelse(qval < 0.001, "< 0.001", round(qval, 3)),
coef = formatC(coef, format = "e", digits = 1),
ann_boxplot = paste("FDR:", qval),
ann_scatter = paste(paste("FDR:", qval),
"\n",
paste("coefficient:", coef),
"\n",
paste("N.not.zero:", N.not.zero)))
Make a customized heatmap using a modified function from the MaAsLin2 package.
hmp <- maaslin2_heatmap(output_results = here("data", "maaslin2", "all_results.tsv"),
first_n = length(unique(sig_result$feature)), # plot all significant associations
cell_value = "qval",
data_label = "data",
metadata_label = "metadata",
title = FALSE,
legend_title = FALSE,
legend_title_position = "leftcenter-rot", # topleft, topcenter, leftcenter-rot or lefttop-rot
color = c("blue", "grey90", "red"),
board_line_col = "white",
colnames_rotate = 90,
colnames_fontsize = 12,
rownames_fontsize = 10,
italize_rownames = TRUE)
hmp
Make boxplot for sample origin effect, color points by Diet.
# Add number of non-zero observation to text annotation
ann_origin <- filter(tab_sigAss, metadata == "Sample_origin" ) %>%
group_by(feature, Sample_origin) %>%
summarize(N = n(), N_nonzero_origin = paste0(sum(abundance != 0), "/", N)) %>%
mutate(ann_origin = paste0(paste0("N.not.zero(", Sample_origin, "): ", N_nonzero_origin))) %>%
group_by(feature) %>%
summarize(n = sum(N), ann_origin = paste(ann_origin, collapse = "\n")) %>%
uncount(n) %>%
arrange(feature) %>%
select(feature, ann_origin)
# Make a dataframe for plotting
sigAss_origin <- filter(tab_sigAss, metadata == "Sample_origin" ) %>%
arrange(feature) %>%
bind_cols(ann_origin) %>%
mutate(ann_boxplot = paste0(ann_boxplot, "\n", ann_origin))
# Make plots
ggplot(sigAss_origin, aes(x = Sample_origin, y = abundance)) +
geom_boxplot(width = 0.5, outlier.shape = NA) +
geom_jitter(aes(colour = Diet), shape = 16, position = position_jitter(0.2)) +
geom_text(aes(x = Inf, y = Inf, label = ann_boxplot), size = 2.5, hjust = 1, vjust = 1.05) +
facet_wrap(~feature, ncol = 5, scales = "free_y") +
scale_y_continuous(limits = c(0, NA), expand = expansion(mult = c(0, 0.6)),
labels = scales::percent_format(accuracy = 0.1)) +
labs(x = "Sample origin", y = "Relative abundance") +
scale_color_manual(values = brewer.pal(n = 8, name = "Dark2")[c(3,4)]) +
theme_bw() +
theme(legend.position = "top", strip.text = element_text(face = "italic"))
Make boxplot for Diet effect, color points by sample origin.
# Add number of non-zero observation to text annotation
ann_diet <- filter(tab_sigAss, metadata == "Diet" ) %>%
group_by(feature, Diet) %>%
summarize(N = n(), N_nonzero_diet = paste0(sum(abundance != 0), "/", N)) %>%
mutate(ann_diet = paste0(paste0("N.not.zero(", Diet, "): ", N_nonzero_diet))) %>%
group_by(feature) %>%
summarize(n = sum(N), ann_diet = paste(ann_diet, collapse = "\n")) %>%
uncount(n) %>%
arrange(feature) %>%
select(feature, ann_diet)
# Make a dataframe for plotting
sigAss_diet <- filter(tab_sigAss, metadata == "Diet" ) %>%
arrange(feature) %>%
bind_cols(ann_diet) %>%
mutate(ann_boxplot = paste0(ann_boxplot, "\n", ann_diet))
# Make plots
ggplot(sigAss_diet, aes(x = Diet, y = abundance)) +
geom_boxplot(width = 0.5, outlier.shape = NA) +
geom_jitter(aes(colour = Sample_origin), shape = 16, position = position_jitter(0.2)) +
geom_text(aes(x = Inf, y = Inf, label = ann_boxplot), size = 3, hjust = 1, vjust = 1.05) +
facet_wrap(~feature, ncol = 5, scales = "free_y") +
scale_y_continuous(limits = c(0, NA), expand = expansion(mult = c(0, 0.6)),
labels = scales::percent_format(accuracy = 0.1)) +
labs(y = "Relative abundance", colour = "Sample origin") +
scale_fill_brewer(palette = "Dark2") +
theme_bw() +
theme(legend.position = "top", strip.text = element_text(face = "italic"))
Make boxplot for hisstological scores, color points by Sample origin.
# Add number of non-zero observation to text annotation
ann_lpc <- filter(tab_sigAss, metadata == "Histology_lpc" ) %>%
group_by(feature, Histology_lpc) %>%
summarize(N = n(), N_nonzero_lpc = paste0(sum(abundance != 0), "/", N)) %>%
mutate(ann_lpc = paste0(paste0("N.not.zero(", Histology_lpc, "): ", N_nonzero_lpc))) %>%
group_by(feature) %>%
summarize(n = sum(N), ann_lpc = paste(ann_lpc, collapse = "\n")) %>%
uncount(n) %>%
arrange(feature) %>%
select(feature, ann_lpc)
# Make a dataframe for plotting
sigAss_lpc <- filter(tab_sigAss, metadata == "Histology_lpc" ) %>%
arrange(feature) %>%
bind_cols(ann_lpc) %>%
mutate(ann_boxplot = paste0(ann_boxplot, "\n", ann_lpc))
# Make plots
ggplot(sigAss_lpc, aes(x = Histology_lpc, y = abundance)) +
geom_boxplot(width = 0.5, outlier.shape = NA) +
geom_jitter(aes(colour = Sample_origin), shape = 16, position = position_jitter(0.2)) +
geom_text(aes(x = Inf, y = Inf, label = ann_boxplot), size = 3, hjust = 1, vjust = 1.05) +
facet_wrap(~feature, ncol = 2, scales = "free_y") +
scale_y_continuous(limits = c(0, NA), expand = expansion(mult = c(0, 0.5)),
labels = scales::percent_format(accuracy = 0.1)) +
labs(x = "Lamina propria cellularity in distal gut", y = "Relative abundance",
colour = "Sample origin") +
scale_fill_brewer(palette = "Dark2") +
theme_bw() +
theme(strip.text = element_text(face = "italic"))
Make scatter plot for DISI, color points by sample origin.
filter(tab_sigAss, metadata == "DISI" ) %>%
ggplot(aes(x = DISI, y = abundance)) +
geom_point(aes(colour = Sample_origin), size = 2) +
geom_text(aes(x = Inf, y = Inf, label = ann_scatter), size = 3, hjust = 1, vjust = 1.05) +
facet_wrap(~feature, ncol = 2, scales = "free_y") +
scale_y_continuous(limits = c(0, NA), expand = expansion(mult = c(0, 0.5)),
labels = scales::percent_format(accuracy = 0.1)) +
labs(y = "Relative abundance", colour = "Sample origin") +
scale_fill_brewer(palette = "Dark2") +
theme_bw() +
theme(strip.text = element_text(face = "italic"))
Make scatter plot for PC1 of immune response genes, color points by sample origin. Note that as the PC1 increases, the expression levels of immune response genes decrease (see PCA biplot above). Hence, a positive coefficient from the MaAsLin2 outputs is indicative of negative correlation between the microbial clades and expression levels of immune genes, and vice versa.
filter(tab_sigAss, metadata == "qPCR_immune_response" ) %>%
ggplot(aes(x = qPCR_immune_response, y = abundance)) +
geom_point(aes(colour = Sample_origin), size = 2) +
geom_text(aes(x = Inf, y = Inf, label = ann_scatter), size = 3, hjust = 1, vjust = 1.05) +
facet_wrap(~feature, ncol = 3, scales = "free_y") +
scale_y_continuous(limits = c(0, NA), expand = expansion(mult = c(0, 0.5)),
labels = scales::percent_format(accuracy = 0.1)) +
labs(x = "qPCR: immune responses in distal gut (PC1 of PCA)", y = "Relative abundance",
colour = "Sample origin") +
scale_fill_brewer(palette = "Dark2") +
theme_bw() +
theme(strip.text = element_text(face = "italic"))
Make scatter plot for PC1 of barrier function relevant genes, color points by sample origin. Note that as the PC1 increases, the expression levels of barrier function relevant genes decrease (see PCA biplot above). Hence, a positive coefficient from the MaAsLin2 outputs is indicative of negative correlation between the microbial clades and expression levels of barrier function relevant genes, and vice versa.
filter(tab_sigAss, metadata == "qPCR_barrier_function" ) %>%
ggplot(aes(x = qPCR_barrier_function, y = abundance)) +
geom_point(aes(colour = Sample_origin), size = 2) +
geom_text(aes(x = Inf, y = Inf, label = ann_scatter), size = 3, hjust = 1, vjust = 1.05) +
facet_wrap(~feature, ncol = 3, scales = "free_y") +
scale_y_continuous(limits = c(0, NA), expand = expansion(mult = c(0, 0.5)),
labels = scales::percent_format(accuracy = 0.1)) +
labs(x = "qPCR: barrier function in distal gut (PC1 of PCA)", y = "Relative abundance",
colour = "Sample origin") +
scale_fill_brewer(palette = "Dark2") +
theme_bw() +
theme(strip.text = element_text(face = "italic"))
Highlight sample origin effects
p_origin <- sigAss_origin %>%
filter(feature %in% c("Brevinema andersonii", "f__Spirochaetaceae")) %>%
# italize genus/species names only
#mutate(labs = ifelse(grepl("__", feature), paste0("bold(", feature, ")"), paste0("bolditalic(", feature, ")")),
#labs = gsub("\\s+", "~", labs)) %>%
#mutate(feature = factor(feature, labels = unique(labs))) %>%
ggplot(aes(x = Sample_origin, y = abundance)) +
geom_boxplot(outlier.shape = NA, width = 0.3) +
geom_jitter(aes(colour = Diet), shape = 16, position = position_jitter(0.1)) +
geom_text(aes(x = Inf, y = Inf, label = ann_boxplot), size = 3, hjust = 1, vjust = 1.05) +
facet_wrap(~feature, ncol = 2, scales = "free_y") + # set'labeller = label_parsed' if using regular expressions
scale_y_continuous(limits = c(0, NA), expand = expansion(mult = c(0, 0.5)),
breaks = 0:5*0.2, labels = scales::percent_format(accuracy = 1)) +
scale_color_manual(values = brewer.pal(n = 8, name = "Dark2")[c(3,4)]) +
labs(x = "Sample origin", y = "Relative abundance") +
theme_bw() +
theme(legend.margin = margin(0, 0, 0, 0),
legend.box.margin = margin(-5, -5, -5, -5),
strip.text = element_text(face = "bold.italic"))
p_origin
Highlight diet effects
p_diet <- sigAss_diet %>%
filter(feature %in% c("Bacillus", "Corynebacterium 1")) %>%
ggplot(aes(x = Diet, y = abundance)) +
geom_boxplot(outlier.shape = NA, width = 0.3) +
geom_jitter(aes(colour = Sample_origin), shape = 16, position = position_jitter(0.1)) +
geom_text(aes(x = Inf, y = Inf, label = ann_boxplot), size = 3, hjust = 1, vjust = 1.05) +
facet_wrap(~feature, ncol = 2, scales = "free_y") +
scale_y_continuous(limits = c(0, NA), expand = expansion(mult = c(0, 0.6)),
labels = scales::percent_format(accuracy = 1)) +
scale_color_manual(values = brewer.pal(n = 8, name = "Dark2")[c(1,2)]) +
labs(color = "Sample origin", y = "Relative abundance") +
theme_bw() +
theme(strip.text = element_text(face = "bold.italic"),
legend.margin = margin(0, 0, 0, 0),
legend.box.margin = margin(-5, -5, -5, -5))
p_diet
Highlight the significant association between Brevinema andersonii and the immune gene expression levels.
p_imm <- tab_sigAss %>%
filter(metadata == "qPCR_immune_response", feature == "Brevinema andersonii") %>%
ggplot(aes(x = qPCR_immune_response, y = abundance)) +
geom_point(aes(colour = Sample_origin), size = 2) +
geom_text(aes(x = Inf, y = Inf, label = ann_scatter), size = 3, hjust = 1, vjust = 1.05) +
facet_wrap(~feature, scales = "free_y") +
scale_y_continuous(limits = c(0, NA), expand = expansion(mult = c(0, 0.3)),
labels = scales::percent_format(accuracy = 1)) +
labs(x = "qPCR: immune responses (PC1 of PCA)", y = "Relative abundance",
colour = "Sample origin") +
scale_color_manual(values = brewer.pal(n = 8, name = "Dark2")[c(1,2)]) +
theme_bw() +
theme(strip.text = element_text(face = "bold.italic"),
legend.margin = margin(0, 0, 0, 0),
legend.box.margin = margin(-5, -5, -5, -5))
p_imm
Highlight the significant association between the unclassified Spirochaetaceae and the barrier gene expression levels.
p_barr <- tab_sigAss %>%
filter(metadata == "qPCR_barrier_function", feature == "f__Spirochaetaceae") %>%
ggplot(aes(x = qPCR_barrier_function, y = abundance)) +
geom_point(aes(colour = Sample_origin), size = 2) +
geom_text(aes(x = Inf, y = Inf, label = ann_scatter), size = 3, hjust = 1, vjust = 1.05) +
facet_wrap(~feature, scales = "free_y") +
scale_y_continuous(limits = c(0, NA), expand = expansion(mult = c(0, 0.35)),
labels = scales::percent_format(accuracy = 1)) +
labs(x = "qPCR: barrier function (PC1 of PCA)", y = "Relative abundance",
colour = "Sample origin") +
scale_color_manual(values = brewer.pal(n = 8, name = "Dark2")[c(1,2)]) +
theme_bw() +
theme(strip.text = element_text(face = "bold.italic"),
legend.margin = margin(0, 0, 0, 0),
legend.box.margin = margin(-5, -5, -5, -5))
p_barr
Assemble plots
# Assemble ggplots
ind <- plot_grid(p_origin, p_diet, p_imm, p_barr, align = "v", axis = "lr",
ncol = 1, labels = c("B", "C", "D", "E"))
# Convert the heatmap (made by the ComplexHeatmap) into a "grob" object
hmp_grob <- grid.grabExpr(draw(hmp, heatmap_legend_side = "left"))
# Combine the heatmap and ggplots
plot_grid(hmp_grob,
ind,
rel_widths = c(1, 1.45),
labels = c("A", ""),
ncol = 2)
# Export the assembled plot
ggsave(here("result", "figures", "Figure 5.tiff"), width = 10, height = 12,
units = "in", dpi = 300, compression = "lzw")
sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.6 LTS
##
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=nb_NO.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=nb_NO.UTF-8
## [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=nb_NO.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=nb_NO.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ComplexHeatmap_2.2.0 circlize_0.4.8 Maaslin2_1.0.0 DT_0.13 EMAtools_0.1.3
## [6] lsr_0.5 lmerTest_3.1-1 lme4_1.1-21 Matrix_1.2-18 factoextra_1.0.6
## [11] phyloseq_1.30.0 MicrobeR_0.3.1 PerformanceAnalytics_2.0.4 xts_0.12-0 zoo_1.8-7
## [16] RColorBrewer_1.1-2 cowplot_1.0.0 forcats_0.5.0 stringr_1.4.0 dplyr_0.8.5
## [21] purrr_0.3.3 readr_1.3.1 tidyr_1.0.2 tibble_2.1.3 ggplot2_3.3.0
## [26] tidyverse_1.3.0 here_0.1 knitr_1.28
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.0.0 RSQLite_2.2.0 htmlwidgets_1.5.1 munsell_0.5.0 codetools_0.2-16 effectsize_0.2.0 withr_2.1.2
## [8] colorspace_1.4-1 Biobase_2.46.0 rstudioapi_0.11 stats4_3.6.3 robustbase_0.93-6 ggsignif_0.6.0 labeling_0.3
## [15] emmeans_1.4.5 optparse_1.6.4 lpsymphony_1.14.0 pheatmap_1.0.12 bit64_0.9-7 farver_2.0.3 rhdf5_2.30.1
## [22] rprojroot_1.3-2 coda_0.19-3 vctrs_0.2.4 treeio_1.10.0 generics_0.0.2 TH.data_1.0-10 xfun_0.12
## [29] R6_2.4.1 clue_0.3-57 philr_1.12.0 assertthat_0.2.1 scales_1.1.0 multcomp_1.4-12 gtable_0.3.0
## [36] phangorn_2.5.5 sandwich_2.5-1 rlang_0.4.5 GlobalOptions_0.1.1 splines_3.6.3 lazyeval_0.2.2 broom_0.5.5
## [43] BiocManager_1.30.10 yaml_2.2.1 reshape2_1.4.3 modelr_0.1.6 crosstalk_1.1.0.1 backports_1.1.5 tools_3.6.3
## [50] logging_0.10-108 zCompositions_1.3.4 ellipsis_0.3.0 biomformat_1.14.0 BiocGenerics_0.32.0 hash_2.2.6.1 Rcpp_1.0.4
## [57] plyr_1.8.6 zlibbioc_1.32.0 ggpubr_0.2.5 pbapply_1.4-2 GetoptLong_0.1.8 S4Vectors_0.24.3 haven_2.2.0
## [64] ggrepel_0.8.2 cluster_2.1.0 fs_1.3.2 DECIPHER_2.14.0 magrittr_1.5 data.table_1.12.8 reprex_0.3.0
## [71] truncnorm_1.0-8 mvtnorm_1.1-0 sjmisc_2.8.3 hms_0.5.3 evaluate_0.14 xtable_1.8-4 pbkrtest_0.4-8.6
## [78] sjstats_0.17.9 readxl_1.3.1 IRanges_2.20.2 shape_1.4.4 compiler_3.6.3 crayon_1.3.4 minqa_1.2.4
## [85] htmltools_0.4.0 rtk_0.2.5.8 mgcv_1.8-31 pcaPP_1.9-73 DataCombine_0.2.21 lubridate_1.7.4 DBI_1.1.0
## [92] sjlabelled_1.1.3 biglm_0.9-1 dbplyr_1.4.2 MASS_7.3-51.5 boot_1.3-24 ade4_1.7-15 getopt_1.20.3
## [99] permute_0.9-5 cli_2.0.2 quadprog_1.5-8 parallel_3.6.3 insight_0.8.2 igraph_1.2.5 pkgconfig_2.0.3
## [106] rvcheck_0.1.8 numDeriv_2016.8-1.1 plotly_4.9.2 xml2_1.2.5 foreach_1.4.8 ggtree_2.0.2 picante_1.8.1
## [113] multtest_2.42.0 XVector_0.26.0 estimability_1.3 rvest_0.3.5 NADA_1.6-1.1 digest_0.6.25 parameters_0.6.0
## [120] vegan_2.5-6 Biostrings_2.54.0 rmarkdown_2.1 cellranger_1.1.0 fastmatch_1.1-0 tidytree_0.3.2 rjson_0.2.20
## [127] nloptr_1.2.2.1 lifecycle_0.2.0 nlme_3.1-144 jsonlite_1.6.1 Rhdf5lib_1.8.0 viridisLite_0.3.0 fansi_0.4.1
## [134] pillar_1.4.3 lattice_0.20-38 httr_1.4.1 DEoptimR_1.0-8 survival_3.1-8 glue_1.3.2 bayestestR_0.5.3
## [141] png_0.1-7 iterators_1.0.12 bit_1.1-15.2 stringi_1.4.6 performance_0.4.4 blob_1.2.1 memoise_1.1.0
## [148] ape_5.3